SQL Server Integration Services: Logged and Nonlogged Operations

11/30/2010 3:15:09 PM

Bulk-copy operations can occur in two modes: logged and nonlogged (also known as slow and fast bcp, respectively). The ideal situation is to operate in nonlogged mode because this arrangement dramatically decreases the load time and consumption of other system resources, such as memory, processor use, and disk access. However, the default runs the load in logged mode, which causes the log to grow rapidly for large volumes of data.

A nonlogged operation requires three things. The target table must not be replicated, because the replication log reader needs the log records to relay the changes made. The database holding the target table must have its SELECT INTO/BULK COPY option set (in SQL Server 2000 and later, this setting is expressed through the BULK_LOGGED or SIMPLE recovery model). Finally, the TABLOCK hint must be specified.

Note

Remember that setting the SELECT INTO/BULK COPY option disables the capability to back up the transaction log until a full database backup has been performed. Transaction log dumps are disabled because if the database had to be restored, the transaction log would not contain a record of the new data.
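
On SQL Server 2000 and later, the database option is usually set through the recovery model. The following is a minimal sketch rather than a definitive procedure. It assumes sqlcmd is installed and that Windows authentication (-E) is in use; servername is a placeholder. By default the script only prints the commands, and it executes them only when RUN is set to an empty string.

```shell
#!/bin/sh
# Dry run by default: each call is printed, not executed.
# Run with RUN="" in the environment to execute (assumes sqlcmd is on PATH).
RUN="${RUN-echo}"
SERVER="servername"        # placeholder server name
DB="AdventureWorks2008"    # database used in this article's bcp examples

# Set the recovery model (BULK_LOGGED before the load, FULL afterward).
set_recovery() {   # usage: set_recovery BULK_LOGGED|SIMPLE|FULL
  $RUN sqlcmd -S "$SERVER" -E -Q "ALTER DATABASE [$DB] SET RECOVERY $1"
}

set_recovery BULK_LOGGED
# ... run the bcp load here ...
set_recovery FULL
```

In keeping with the note above, a database backup would normally follow the load.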

Although you can still perform fast loads against tables that have indexes, it is advisable to drop the indexes before the load and re-create them after the data transfer operation is complete. When you compare the two approaches, keep in mind that the total load time includes both the loading of the data and the index creation time. If there is existing data in the table, the operation is logged; you achieve a nonlogged operation only if the table is initially empty.

Generally, you get at least a 50% drop in transfer speed if the table has an index. The more indexes, the greater the performance degradation. This is due to the logging factor: more log records are being generated, and index pages are being loaded into the cache and modified. This can also cause the log to grow, possibly filling it (depending on the log file settings).

Note

Despite the name, even a nonlogged operation logs some things. In the case of indexes, index page changes and allocations are logged, but the main area of logging is of extent allocations every time the table is extended for additional storage space for the new rows.

Batches

By default, bcp puts all the rows that are inserted into the target table into a single transaction. bcp calls this a batch. This arrangement reduces the amount of work the log must deal with; however, it locks down the transaction log by keeping a large part of it active, which can make truncating or backing up the transaction log impossible or unproductive. By using the bcp batch (-b) switch, you can control the number of rows in each batch (or, effectively, each transaction). This switch controls the frequency of commits; although it can increase the activity in the log, it enables you to trim the size of the transaction log. You should tune the batch size in relation to the size of the data rows, transaction log size, and total number of rows to be loaded. The value you use for one load might not necessarily be the right value for all other loads.

Note that if a subsequent batch fails, the prior batches are committed, and those rows become part of the table. However, any rows copied up to the point of failure in the failing batch are rolled back.
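
The batch arithmetic described above can be sketched in a few lines of shell. The helper names are hypothetical, and the row counts and the batch size of 5000 are illustrative values, not recommendations. The final line prints a matching bcp command rather than running it.

```shell
#!/bin/sh
# Number of batches (transactions) bcp commits for a given -b value.
batch_count() {   # usage: batch_count TOTAL_ROWS BATCH_SIZE
  echo $(( ($1 + $2 - 1) / $2 ))   # ceiling division
}

# Rows left in the table when the Nth batch (1-based) fails:
# every earlier batch stays committed, the failing batch is rolled back.
rows_committed_before_failure() {   # usage: rows_committed_before_failure FAILING_BATCH BATCH_SIZE
  echo $(( ($1 - 1) * $2 ))
}

batch_count 20000 5000                  # prints 4
rows_committed_before_failure 3 5000    # prints 10000

# Dry run of the corresponding load: the command is printed, not executed.
echo "bcp AdventureWorks2008.Sales.SalesOrderHeader in SalesOrders20000.dat -T -S servername -c -b 5000"
```

A restart after a failure in batch 3 would therefore need to skip the first 10,000 rows, for example with the -F switch.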

Parallel Loading

A great enhancement of bcp is that you can now use it to do parallel loads of tables. If you want to take advantage of this feature, the following must be true:

The bulk-copy operation must be nonlogged; all requirements specified in the previous discussion on nonlogged operations must be met.
There must be no indexes on the target table.

Only applications using the ODBC or SQL OLE DB–based APIs can perform parallel data loads into a single table.

The procedure is straightforward. After you ascertain that the target table has no indexes (which could involve dropping primary or unique constraints) and is not being replicated, you must set the database option SELECT INTO/BULK COPY to true. The requirement to drop all indexes has to do with the locking that must occur to load the data. Although the table itself can have a shared lock, the index pages are an area of contention that prevents parallel access.

Now all that is required is to set up the parallel bcp loads to load the data into the table. You can use the -F and -L switches to specify the range of rows you want each parallel bcp session to load into the table if the sessions read from the same data file. Using these switches removes the need to manually break up the file. Here is an example of the command switches involved for a parallel load with bcp into the Sales.SalesOrderHeader table:

bcp AdventureWorks2008.Sales.SalesOrderHeader IN SalesOrders10000.dat -T -S servername -c -F 1 -L 10000 -h "TABLOCK"

bcp AdventureWorks2008.Sales.SalesOrderHeader IN SalesOrders20000.dat -T -S servername -c -F 10001 -L 20000 -h "TABLOCK"

The TABLOCK hint (-h switch) provides improved performance by removing contention from other users while the load takes place. If you do not use the hint, the load takes place using row-level locks, and this is considerably slower.

SQL Server 2008 allows parallel loads without affecting performance by making each bcp connection create extents in nonoverlapping ranges. The ranges are then linked into the table’s page chain.
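
When every session reads from one shared data file, the -F and -L values for each session can be computed rather than typed by hand. The sketch below assumes a hypothetical file name (SalesOrders.dat) and a known total row count. It only prints one bcp command per range; each printed command would then be started in its own session.

```shell
#!/bin/sh
# Split TOTAL_ROWS of one data file into WORKERS contiguous row ranges and
# print one bcp command per range (dry run: nothing is executed).
parallel_bcp_commands() {   # usage: parallel_bcp_commands TOTAL_ROWS WORKERS DATAFILE
  total=$1
  workers=$2
  file=$3
  chunk=$(( (total + workers - 1) / workers ))   # rows per worker, rounded up
  first=1
  while [ "$first" -le "$total" ]; do
    last=$(( first + chunk - 1 ))
    if [ "$last" -gt "$total" ]; then
      last=$total                                # final range may be shorter
    fi
    echo "bcp AdventureWorks2008.Sales.SalesOrderHeader in $file -T -S servername -c -F $first -L $last -h \"TABLOCK\""
    first=$(( last + 1 ))
  done
}

# Two sessions over 20,000 rows: rows 1-10000 and rows 10001-20000.
parallel_bcp_commands 20000 2 SalesOrders.dat
```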

After the table is loaded, it is also possible to create multiple nonclustered indexes in parallel. If the table needs a clustered index, create that one first, and then create the nonclustered indexes in parallel.

Supplying Hints to bcp

The SQL Server 2008 version of bcp enables you to further control the speed of data loading, to invoke constraints, and to have insert triggers fired during loads. To take advantage of these capabilities, you use hint switches to specify one or more hints at a time. Following is the syntax:

-h "hint [, hint]"

This option cannot be used when bulk-copying data into versions of SQL Server before version 7.0 because, starting with SQL Server 7.0, bcp works in conjunction with the query processor. The query processor optimizes data loads and unloads for the OLE DB rowsets that the latest versions of bcp and BULK INSERT can generate.
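
As an illustration of the syntax, the comma-separated hint list can be assembled from individual hint names. The join_hints helper below is hypothetical, and the resulting bcp command is printed rather than executed.

```shell
#!/bin/sh
# Join hint names into the comma-separated list that -h expects.
join_hints() {   # usage: join_hints HINT [HINT ...]
  hints=""
  for h in "$@"; do
    if [ -z "$hints" ]; then
      hints="$h"
    else
      hints="$hints, $h"
    fi
  done
  echo "$hints"
}

HINTS=$(join_hints TABLOCK CHECK_CONSTRAINTS)
# Printed rather than executed, so this stays a dry run.
echo "bcp AdventureWorks2008.Sales.SalesOrderHeader in SalesOrders20000.dat -T -S servername -c -h \"$HINTS\""
```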

The following sections describe the various hints you can specify with the -h switch.

The ROWS_PER_BATCH Hint

The ROWS_PER_BATCH hint is used to tell SQL Server the total number of rows in the data file. This hint helps SQL Server optimize the entire load operation. This hint and the -b switch heavily influence the logging operations that occur with data inserts. If you specify both this hint and the -b switch, they must have the same values, or you get an error message.

When you use the ROWS_PER_BATCH hint, you copy the entire result set as a single transaction. SQL Server automatically optimizes the load operation, using the batch size you specify. The value you specify does not have to be accurate, but you should be aware of the practical limit, based on the database’s transaction log.

Tip

Do not be confused by the name of the ROWS_PER_BATCH hint. You are specifying the total number of rows in the data file, not the batch size (as is the case with the -b switch).
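
Because the value is an estimate of the rows in the file, it can be derived from the file itself. The sketch below assumes a character-mode (-c) file with one row per line (the default newline row terminator and no embedded newlines). The helper names are hypothetical, and the command is printed rather than run.

```shell
#!/bin/sh
# Approximate row count of a character-mode data file: one row per line.
count_rows() {   # usage: count_rows DATAFILE
  wc -l < "$1" | tr -d ' '
}

# Build the ROWS_PER_BATCH hint from the row count and print the load command.
rows_per_batch_command() {   # usage: rows_per_batch_command DATAFILE
  rows=$(count_rows "$1")
  echo "bcp AdventureWorks2008.Sales.SalesOrderHeader in $1 -T -S servername -c -h \"ROWS_PER_BATCH = $rows\""
}

# Example (dry run): rows_per_batch_command SalesOrders20000.dat
```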

The CHECK_CONSTRAINTS Hint

The CHECK_CONSTRAINTS hint controls whether check constraints are executed as part of the bcp operation. With bcp, the default is that check constraints are not executed. This hint option allows you to turn the feature on (to have check constraints executed for each insert). If you do not use this option, you should either be very sure of your data or rerun the same logic as in the check constraints you deferred after the data has been loaded.
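
The two approaches can be sketched as follows: either check constraints during the load, or load without the hint and re-validate afterward. This is a sketch under stated assumptions, not a prescribed procedure. It assumes sqlcmd is available with Windows authentication; servername is a placeholder and the function names are hypothetical. Commands are printed unless RUN is set to an empty string.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
RUN="${RUN-echo}"
TABLE="Sales.SalesOrderHeader"

# Option 1: have the check constraints evaluated during the load.
load_with_checks() {   # usage: load_with_checks DATAFILE
  $RUN bcp "AdventureWorks2008.$TABLE" in "$1" -T -S servername -c -h "CHECK_CONSTRAINTS"
}

# Option 2: load without the hint, then re-validate all constraints on the
# table afterward; the statement fails if any loaded row violates one.
recheck_constraints() {
  $RUN sqlcmd -S servername -E -d AdventureWorks2008 -Q "ALTER TABLE $TABLE WITH CHECK CHECK CONSTRAINT ALL"
}
```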

The FIRE_TRIGGERS Hint

The FIRE_TRIGGERS hint controls whether the insert trigger on the target table is executed as part of the bcp operation. With bcp, the default is that no triggers are executed. This hint option allows you to turn the feature on (to have insert triggers executed during the load). As you can imagine, when this option is used, it slows down the bcp load operation. However, the business reasons to have the insert trigger fired might outweigh the slower loading.

The ORDER Hint

If the data you want to load is already in the same sequence as the clustered index on the receiving table, you can use the ORDER hint. The syntax for this hint is as follows:

ORDER( {column [ASC | DESC] [,...n]})

There must be a clustered index on the same columns, in the same key sequence as specified in the ORDER hint. Using a sorted data file (in the same order as the clustering index) helps SQL Server place the data into the table with minimal overhead.
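
A pre-check like the following can guard against supplying the hint for an unsorted file. It is a sketch under several assumptions: a tab-delimited character-mode file, a clustered index on a single integer key (SalesOrderID) that is the first field in the file, and hypothetical function names. The bcp command is printed, not executed.

```shell
#!/bin/sh
# Succeeds when the first tab-separated field of the file is in ascending
# numeric order (assumes the clustered key is a single integer column).
is_sorted_by_first_column() {   # usage: is_sorted_by_first_column DATAFILE
  cut -f1 "$1" | sort -n -c 2>/dev/null
}

# Print a load command that uses the ORDER hint only when the file is sorted.
load_ordered() {   # usage: load_ordered DATAFILE
  if is_sorted_by_first_column "$1"; then
    echo "bcp AdventureWorks2008.Sales.SalesOrderHeader in $1 -T -S servername -c -h \"ORDER(SalesOrderID ASC)\""
  else
    echo "bcp AdventureWorks2008.Sales.SalesOrderHeader in $1 -T -S servername -c"
  fi
}
```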

The KILOBYTES_PER_BATCH Hint

The KILOBYTES_PER_BATCH hint gives the size, in kilobytes, of the data in each batch. This is an estimate that SQL Server uses internally to optimize the data load and logging areas of the bcp operation.

The TABLOCK Hint

The TABLOCK hint is used to place a table-level lock for the bcp load duration. This hint gives you increased performance at a loss of concurrency, as described in the section “Parallel Loading,” earlier in this chapter.